12 research outputs found

    Qluster: An easy-to-implement generic workflow for robust clustering of health data

    The exploration of health data by clustering algorithms makes it possible to better describe populations of interest by seeking the sub-profiles that compose them. This in turn reinforces medical knowledge, whether about a disease or a targeted population in real life. Nevertheless, in contrast to conventional biostatistical methods, for which numerous guidelines exist, the standardization of data science approaches in clinical research remains a little-discussed subject. This results in significant variability in the execution of data science projects, whether in terms of the algorithms used or the reliability and credibility of the designed approach. Taking the path of a parsimonious and judicious choice of both algorithms and implementations at each stage, this article proposes Qluster, a practical workflow for performing clustering tasks. This workflow strikes a compromise between (1) genericity of application (e.g. usable on small or big data, on continuous, categorical or mixed variables, and on databases of high dimensionality or not), (2) ease of implementation (few packages, few algorithms, few parameters), and (3) robustness (e.g. use of proven algorithms and robust packages, evaluation of the stability of clusters, management of noise and multicollinearity). The workflow can be easily automated and/or routinely applied to a wide range of clustering projects. It can be useful both for data scientists with little experience in the field, to make data clustering easier and more robust, and for more experienced data scientists who are looking for a straightforward and reliable solution for routine preliminary data mining. A synthesis of the literature on data clustering and the scientific rationale supporting the proposed workflow are also provided. Finally, a detailed application of the workflow to a concrete use case is presented, along with a practical discussion for data scientists. An implementation on the Dataiku platform is available upon request to the authors.
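The abstract lists evaluation of cluster stability under robustness but does not say how it is done. As an illustration only, the sketch below uses one common approach, bootstrap resampling with a best-match Jaccard similarity per cluster. The tiny 1-D k-means and the names `kmeans_1d`, `jaccard` and `bootstrap_stability` are hypothetical helpers written for this example, not part of Qluster.

```python
import random

def kmeans_1d(values, k, iters=50):
    """Tiny 1-D k-means (illustrative); returns one cluster label per value."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers over the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = sum(members) / len(members)
    return labels

def jaccard(a, b):
    """Jaccard similarity of two sets of row indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def bootstrap_stability(values, k, n_boot=100, seed=0):
    """Mean best-match Jaccard similarity for each cluster of the original fit."""
    rng = random.Random(seed)
    n = len(values)
    base = kmeans_1d(values, k)
    base_sets = [{i for i in range(n) if base[i] == c} for c in range(k)]
    totals = [0.0] * k
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        labels = kmeans_1d([values[i] for i in sample], k)
        boot_sets = [{sample[j] for j in range(n) if labels[j] == c} for c in range(k)]
        drawn = set(sample)
        for c in range(k):
            # Compare only on rows that actually appear in this resample.
            restricted = base_sets[c] & drawn
            totals[c] += max(jaccard(restricted, b) for b in boot_sets)
    return [t / n_boot for t in totals]
```

Calling `bootstrap_stability(data, k=2)` on a list of numbers returns one score per cluster. Scores near 1 suggest that a cluster is recovered in most resamples, while low scores suggest that it may be an artifact of the particular sample.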

    Qluster: An easy-to-implement generic workflow for robust clustering of health data

    The exploration of health data by clustering algorithms makes it possible to better describe populations of interest by seeking the sub-profiles that compose them. This in turn reinforces medical knowledge, whether about a disease or a targeted population in real life. Nevertheless, in contrast to conventional biostatistical methods, for which numerous guidelines exist, the standardization of data science approaches in clinical research remains a little-discussed subject. This results in significant diversity in the execution of data science projects, whether in terms of the algorithms used or the reliability and credibility of the designed approach. Taking the path of a parsimonious and judicious choice of both algorithms and implementations at each stage, this paper proposes Qluster, a practical workflow for performing clustering tasks. The workflow is intended to be (1) generic, as it is suitable regardless of the data volume (small/big) and the nature of the variables (continuous/qualitative/mixed), (2) easy to implement, as it is based on a few easy-to-use software packages, and (3) robust, through the stability evaluation of the final clusters and the use of recognized algorithms and implementations. The workflow can be easily automated and/or routinely applied to a wide range of clustering projects. A synthesis of the literature on data clustering and the scientific rationale supporting the proposed workflow are also provided. Finally, a detailed application of the workflow to a concrete use case is presented, along with a practical discussion for data scientists. An implementation on the Dataiku platform is available upon request to the authors.

    The accuracy versus interpretability trade-off in fraud detection model

    Like a hydra, fraudsters adapt and circumvent the increasingly sophisticated barriers erected by public or private institutions. Among these institutions, banks must quickly take measures to avoid losses while guaranteeing the satisfaction of law-abiding customers. Facing an expanding flow of operations, effective banking relies on data analytics to support established risk control processes, but also on a better understanding of the underlying fraud mechanism. In addition, fraud being a criminal offence, the evidential aspect of the process must also be considered. These legal, operational, and strategic constraints lead to compromises on the means to be implemented for fraud management. This paper first focuses on translating the practical questions raised in the banking industry at each step of the fraud management process into the performance evaluation required to design a fraud detection model. Secondly, it considers a range of machine learning approaches that address these specificities: the imbalance between fraudulent and nonfraudulent operations, the lack of fully trusted labels, the concept-drift phenomenon, and the unavoidable trade-off between accuracy and interpretability of detection. This state-of-the-art review sheds some light on a technology race between black-box machine learning models improved by post-hoc interpretation and intrinsically interpretable models boosted to gain accuracy. Finally, it discusses how concrete and promising hybrid approaches can provide pragmatic, short-term answers to banks and policy makers without swallowing up stakeholders with economic and ethical stakes in this technological race.

    Benefits and risks of noninvasive oxygenation strategy in COVID-19: a multicenter, prospective cohort study (COVID-ICU) in 137 hospitals

    Rationale: To evaluate the respective impact of standard oxygen, high-flow nasal cannula (HFNC) and noninvasive ventilation (NIV) on oxygenation failure rate and mortality in COVID-19 patients admitted to intensive care units (ICUs). Methods: Multicenter, prospective cohort study (COVID-ICU) in 137 hospitals in France, Belgium, and Switzerland. Demographic, clinical, respiratory support, oxygenation failure, and survival data were collected. Oxygenation failure was defined as either intubation or death in the ICU without intubation. Variables independently associated with oxygenation failure and Day-90 mortality were assessed using multivariate logistic regression. Results: From February 25 to May 4, 2020, 4754 patients were admitted to the ICU. Of these, 1491 patients were not intubated on the day of ICU admission and received standard oxygen therapy (51%), HFNC (38%), or NIV (11%) (P < 0.001). Oxygenation failure occurred in 739 (50%) patients (678 intubations and 61 deaths). For standard oxygen, HFNC, and NIV, the oxygenation failure rate was 49%, 48%, and 60%, respectively (P < 0.001). By multivariate analysis, HFNC (odds ratio [OR] 0.60, 95% confidence interval [CI] 0.36–0.99, P = 0.013) but not NIV (OR 1.57, 95% CI 0.78–3.21) was associated with a reduction in oxygenation failure. Overall 90-day mortality was 21%. By multivariable analysis, HFNC was not associated with a change in mortality (OR 0.90, 95% CI 0.61–1.33), while NIV was associated with increased mortality (OR 2.75, 95% CI 1.79–4.21, P < 0.001). Conclusion: In patients with COVID-19, HFNC was associated with a reduction in oxygenation failure without improvement in 90-day mortality, whereas NIV was associated with higher mortality. Randomized controlled trials are needed.
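For readers less familiar with how logistic regression output becomes an odds ratio with a confidence interval, here is a minimal sketch. It assumes Wald intervals, which are symmetric on the log scale; the abstract does not state which interval type was used. The function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def odds_ratio_ci(beta, se, z=Z_95):
    """Convert a logistic coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def log_scale_from_ci(lower, upper, z=Z_95):
    """Recover the coefficient and standard error implied by a Wald 95% CI."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

def wald_p_value(beta, se):
    """Two-sided p-value for H0: beta = 0 under a normal approximation."""
    z = abs(beta) / se
    return math.erfc(z / math.sqrt(2))

# Example with the interval reported for NIV and 90-day mortality (1.79 to 4.21).
beta, se = log_scale_from_ci(1.79, 4.21)
odds_ratio, lower, upper = odds_ratio_ci(beta, se)
```

Under that assumption, the midpoint of the log-scale interval gives back a point estimate close to the reported OR of 2.75, and the implied p-value is far below 0.001, in line with the abstract.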

    Extracorporeal Membrane Oxygenation for Severe Acute Respiratory Distress Syndrome Associated with COVID-19: An Emulated Target Trial Analysis

    Predicting 90-day survival of patients with COVID-19: Survival of Severely Ill COVID (SOSIC) scores

    Background: Predicting outcomes of critically ill intensive care unit (ICU) patients with coronavirus disease 2019 (COVID-19) is a major challenge for avoiding futile and prolonged ICU stays. Methods: The objective was to develop predictive survival models for patients with COVID-19 after 1 to 2 weeks in the ICU. The study was based on the COVID-ICU cohort, which prospectively collected characteristics, management, and outcomes of critically ill patients with COVID-19. Machine learning was used to develop dynamic, clinically useful models able to predict 90-day mortality using ICU data collected on day (D) 1, D7 or D14. Results: Survival of Severely Ill COVID (SOSIC)-1, SOSIC-7, and SOSIC-14 scores were constructed with 4244, 2877, and 1349 patients, respectively, randomly assigned to development or test datasets. The three models selected 15 ICU-entry variables recorded on D1, D7, or D14. Cardiovascular, renal, and pulmonary functions on prediction D7 or D14 were among the most heavily weighted inputs for both models. For the test dataset, SOSIC-7's area under the ROC curve was slightly higher (0.80 [0.74–0.86]) than those for SOSIC-1 (0.76 [0.71–0.81]) and SOSIC-14 (0.76 [0.68–0.83]). Similarly, SOSIC-1 and SOSIC-7 had excellent calibration curves, with similar Brier scores for the three models. Conclusion: The SOSIC scores showed that entering 15 to 27 baseline and dynamic clinical parameters into an automatable XGBoost algorithm can potentially accurately predict the likely 90-day mortality post-ICU admission (sosic.shinyapps.io/shiny). Although external validation of the SOSIC scores is still needed, they are an additional tool to strengthen decisions about life-sustaining treatments and to inform family members of the likely prognosis.
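The abstract reports discrimination as area under the ROC curve and calibration through Brier scores. The following sketch shows how both metrics are defined, using plain Python and made-up predictions; it is not the SOSIC evaluation code, and `brier_score` and `roc_auc` are illustrative names.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def roc_auc(y_true, y_prob):
    """Probability that a random positive case is ranked above a random negative one."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = died within 90 days) and predicted risks.
outcomes = [1, 0, 1, 0]
risks = [0.9, 0.2, 0.6, 0.7]
```

A lower Brier score indicates better-calibrated and sharper probabilities, while an AUC of 0.5 corresponds to ranking no better than chance.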

    Characteristics and prognosis of bloodstream infection in patients with COVID-19 admitted in the ICU: an ancillary study of the COVID-ICU study

    Background: Patients infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and requiring intensive care unit (ICU) admission have a high incidence of hospital-acquired infections; however, data regarding hospital-acquired bloodstream infections (BSI) are scarce. We aimed to investigate risk factors and outcome of BSI in critically ill coronavirus disease 2019 (COVID-19) patients. Patients and methods: We performed an ancillary analysis of a multicenter prospective international cohort study (COVID-ICU study) that included 4010 COVID-19 ICU patients. For the present analysis, only those with data regarding primary outcome (death within 90 days from admission) or BSI status were included. Risk factors for BSI were analyzed using the Fine and Gray competing risk model. Then, for outcome comparison, 537 BSI patients were matched with 537 controls using propensity score matching. Results: Among 4010 included patients, 780 (19.5%) acquired a total of 1066 BSI (10.3 BSI per 1000 patient-days at risk), of which 92% were acquired in the ICU. Higher SAPS II, male gender, longer time from hospital to ICU admission and antiviral drug use before admission were independently associated with an increased risk of BSI; interestingly, this risk decreased over time. BSI was independently associated with a shorter time to death in the overall population (adjusted hazard ratio (aHR) 1.28, 95% CI 1.05–1.56) and, in the propensity score matched data set, patients with BSI had a higher mortality rate (39% vs 33%, p = 0.036). BSI accounted for 3.6% of deaths in the overall population. Conclusion: COVID-19 ICU patients have a high risk of BSI, especially early after ICU admission, a risk that increases with severity but not with corticosteroid use. BSI is associated with an increased mortality rate.
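As a reading aid, the incidence figures in this abstract can be related to each other with simple arithmetic. The sketch below uses only the numbers reported above; the total number of patient-days at risk is not given in the abstract, so the value computed here is an implied estimate, not a reported result. The function names are hypothetical.

```python
def incidence_per_1000(events, patient_days):
    """Incidence density expressed per 1000 patient-days at risk."""
    return 1000 * events / patient_days

def implied_patient_days(events, rate_per_1000):
    """Patient-days at risk implied by an event count and a rate per 1000 patient-days."""
    return 1000 * events / rate_per_1000

# Figures taken from the abstract.
patients = 4010
patients_with_bsi = 780
bsi_episodes = 1066
reported_rate = 10.3  # BSI per 1000 patient-days at risk

# Share of patients with at least one BSI, in percent.
cumulative_incidence = 100 * patients_with_bsi / patients
# Implied denominator; an estimate only, since it is not reported.
days_at_risk = implied_patient_days(bsi_episodes, reported_rate)
```

The first quantity reproduces the reported 19.5% once rounded, and the second suggests a denominator on the order of 100,000 patient-days at risk.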